The Pennsylvania State University, Spring 2021 Stat 415-001, Hyebin Song

Probability Review

Probability ReviewElements of probabilityRandom variablesDistribution functionsExpectationVarianceSome common random variablesTwo or more random variablesDistribution functionsConditional distributionsIndependenceExpectation and CovarianceMoment generating functionsSome limit theorems

Elements of probability

$\Omega$ : The set of all the outcomes of a random experiment.

Event: $\Omega$ . In other words, a collection of some possible outcomes

Probability: $[0,1]$ as an output, which satisfies the following three properties (Axioms of Probability).

$E$ $P(E)\geq 0$ .
$P(\Omega)=1$ .
$A_1,A_2,…$ $P(\cup_i A_i)=\sum_iP(A_i)$ .

Example Tossing a fair coin twice.

$\Omega$ $\{(H,H),(H,T),(T,H),(T,T)\}$
$\{(H,H)\}$ $\{(T,H),(H,T)\}$ $\{(H,T),(H,H)\}$ , etc.

Random variables

A random variable is a variable whose outcome is determined by a random experiment.

$\Omega$ as an input and returns a real number.
$X(\omega)$ $X$ .

Events can be defined in terms of random variables:

For example, the following are events:

$\{X=x\}$ $x\in \mathbb{R}$
$\{a\leq X \leq b\}$ $-\infty<a<b<\infty$

$P(X=x)$ $P(a\leq X \leq b)$ .

More formally,

$\{X=x\} = \{\omega\in \Omega; X(\omega)=x\}$ . Similarly,
$\{a\leq X \leq b\} =\{\omega \in \Omega; a\leq X(\omega) \leq b\}$ .

$\{a\le X \le b\} = \{\frac{a-c_1}{c_2} \le \frac{X-c_1}{c_2}\le \frac{b-c_1}{c_2}\}$ $c_1,c_2$ $c_2>0$ .

Example $X$ $P(X=2)$ $P(X/2=1$ ).

$\Omega$ $\{(H,H),(H,T),(T,H),(T,T)\}$
$P(X=2) = P(\{\omega \in \Omega; X(\omega)=2\}) = P((H,H)) = 1/4$ .
$X/2=1$ $X=2$ .
$P(X/2=1) = P(X=2) = 1/4$ .

$A$ $1[A]$ $1$ $A$ $0$ otherwise.

$1[X=2]$ $1$ $X=2$ $0$ otherwise.

Class Question: $1[X=2]$ ?

Distribution functions

$X$ $F_X:\mathbb{R}\rightarrow [0,1]$ such that

F_X(t) = P(X\leq t).

It is often easier to handle the pmf or pdf than the cdf, where

$X$ is discrete,
- $p_X(t) = P(X=t)$
$X$ is continuous.
- $f_X(t)$ $B \subset \mathbb{R}$ ,
  $P(X\in B) = \int_{x\in B} f_X(x) dx$
  In particular,
  $P(t-\epsilon/2<X<t+\epsilon/2 ) = \int_{t-\epsilon/2}^{t+\epsilon/2}f_X(x)dx \approx \epsilon f_X(t)$
  $\epsilon>0$ .
- $f_X(t) = \frac{d}{dx}F_X(t)$
$\leftrightarrow$ pmf/pdf
Some pmfs or pdfs have names!

Remark $X$ $X$ .

specify $X$ $X$ .
- $X \sim F_X$ $X \sim {\rm Name(param)}$

Example

$F_X(t) = \int_{-\infty}^t \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}s^2} ds, \,\,t\in \mathbb{R}$ .
$p_X(t) = (0.4)^t (0.6)^{1-t},\,\, t\in\{0,1\}$ .
$f_X(t) = 3e^{-3t},\,\, t>0$ .

Three main quantities of interest

$X$ $g$ , three main quantities of interest are

$P(a\leq g(X) \leq b)$
$E(g(X))$ $g(X)$
${\rm Var} (g(X))$ $g(X)$

Expectation

$X$ $p_X(t)$ $f_X(t)$ $g$ ,

$E[g(X)] = \sum_{x } g(x) p_X(x)$ $X$ discrete
$E[g(X)] = \int_{\mathbb{R}} g(x)f_X(x) dx$ $X$ continuous

Remark: $g(X)$ $g(x)$ $p_X(x)$ $f_X(x)$ .

Properties

$E[a]=a$ $a\in \mathbb{R}$ .
$E[ag(X)]=aE[g(X)]$ $a \in \mathbb{R}$ .
$E[f(X)+g(X)]=E[f(X)]+E[g(X)]$

Exercise $N$ $X = 1[N=2]$ $\{N=2\}$ $E[X^2 +3X+2]$ .

$X \sim Ber(p)$ $p = P(N=2) = .25$ $E[X^2 +3X+2] = E[X^2]+E[3X]+E[2] = E[X^2]+3E[X]+2$
$E[X^2] = \sum_{x=0}^1 x^2 p_X(x) = p_X(1) = .25$
$E[X] = \sum_{x=0}^1 xp_X(x) = p_X(1) = .25.$
$.25+.75+2 = 3$ .

Variance

$X$ $g$ ,

{\rm Var}(g(X)) = E[\{g(X) - E[g(X)]\}^2]

Remark: $g(X)$ concentrated $g(X)$ $E[g(X)]$ .

Properties

${\rm Var}[a]=0$ ${\rm Var}(g(X)+a) = {\rm Var}(g(X))$ $a\in \mathbb{R}$ .
${\rm Var(g(X))} = E[g(X)^2] - E[g(X)]^2$
${\rm Var}[ag(X)]=a^2{\rm Var}[g(X)]$ $a \in \mathbb{R}$ .
${\rm Var}[f(X)+g(X)]={\rm Var}[f(X)]+{\rm Var}[g(X)] + 2{\rm Cov}(f(X),g(X))$

Exercise $N$ $X = 1[N=2]$ ${\rm Var}[3X+2]$ .

$X \sim {\rm Ber}(.25)$ .
${\rm Var}[3X+2] = 9{\rm Var}(X) = 9(.25)(.75)$

Some common random variables

$n,p$ $\lambda$ $p$ $r,p$ ), ...
$\mu,\sigma^2$ $\chi^2(n)$ $a,b$ $a,b$ $\lambda$ $\alpha,\lambda$ $t(d)$ $F(d_1,d_2)$ , ...

Example $\mu = 150$ $\sigma^2 = 900$ $^2$ . Given that the mean of the trunk diameters of 100 randomly selected pine tree is greater than 156 cm, calculate the conditional probability that the sample mean exceeds 158 cm.

$X_i$ be the trunk diameter of the ith selected pine tree. The question is
$P(\bar{X}\ge 158 | \bar{X}\ge 156)$ .
By definition,
$P(\bar{X}\ge 158 | \bar{X}\ge 150) = \frac{P(\bar{X}\ge 156)}{P(\bar{X}\ge 150)}$
Also we have,
$\bar{X}= \frac{1}{100}(X_1+\dots X_n) \sim N(150, \frac{900}{100})$
$P(\bar{X}\ge 158) = P(\frac{\bar{X}-150}{3}\ge \frac{158-150}{3}) = 1-\Phi(8/3).$ , and
$P(\bar{X}\ge 156 )= P(\frac{\bar{X}-150}{3}\ge \frac{156-150}{3}) = 1-\Phi(2).$
$P(\bar{X}\ge 158 | \bar{X}\ge 156) = \frac{1-\Phi(8/3)}{1-\Phi(2)}$ .

Two or more random variables

A random vector is a collection of random variables.

Examples

$(X,Y)$ $2$ .
$(X_1,\dots,X_n)$ $n$ .

Distribution functions

joint (cumulative) distribution function $(X_1,\dots,X_n)$ is defined by

F(x_1,\dots,x_n)=P(X_1\leq x_1,\dots,X_n\leq x_n).

Remark: $X_1,\dots,X_n$ can be calculated.

$(X_1,\dots,X_n)$ is completely specified if we specify the joint distribution function.

$X_i$ $X_i$ ).

$p$ $p(x_1,\dots,x_n) = P(X_1=x_1,\dots,X_n=x_n)$ .
$f$ $f(x_1,\dots,x_n)$ $B \subset \mathbb{R}^n$ ,
$P((X_1,X_2,\dots,X_n)\in B) = \int\int\dots\int_{(x_1,\dots,x_n)\in B} f(x_1,\dots,x_n)d x_1 dx_2\dots dx_n$

$n=2$ ,

$p$ $p(x_1,x_2) = P(X_1=x_1,X_2=x_2)$ .
$f$ $f(x_1,x_2)$ $B \subset \mathbb{R}^2$ ,
$P((X_1,X_2)\in B) = \int\int_{(x_1,x_2)\in B} f(x_1,x_2)d x_1 dx_2$

$X_i$ (a.k.a the marginal pmf/pdf) can be obtained by summing/integrating over other variables.

$n=2$ ,

$p_{X_1}$ $p_{X_1}(x_1)=\sum_{x_2} p(x_1,x_2)$
$f_{X_1}$ $f_{X_1}(x_1) = \int_{\mathbb{R}} f(x_1,x_2)dx_2$

Conditional distributions

$(X,Y)$ . The $X$ $Y=y$ the probability distribution $X$ $Y$ $y$ .

usually $y$ .

$X$ $Y$ $X$ $Y=y$ is
$p_{X|Y} (x|y) = P(X=x|Y=y) = \frac{p(x,y)}{p_Y(y)}$
$y$ $p_Y(y)>0$ .
$X$ $Y$ $X$ $Y=y$ is
$f_{X|Y}(x|y) = \frac{f(x,y)}{f_Y(y)}$
$y$ $f_Y(y)>0$ .

Example $X$ $Y$ $X$ $Y=1$

$p(x,y) = P(X=x,Y=y)$
p(x,y) x=0 x=1
y=0 p(0,0)=0 p(1,0)=1/2
y=1 p(0,1)=1/2 p(1,1)=0
$p_Y(y) = \sum_x p(x,y)$
$p_Y(0) = \sum_x p(x,0)=p(0,0)+p(1,0) = 1/2$
$p_Y(1) = \sum_x p(x,1) = p(0,1)+p(1,1) = 1/2$
Therefore,
$p_{X|Y}(x|1) = \frac{p(x,1)}{p_Y(1)} = \begin{cases} 0 & {\rm if } \,\, x=1\\ 1 & {\rm if } \,\, x=0 \end{cases}$

p(x,y)	x=0	x=1
y=0	p(0,0)=0	p(1,0)=1/2
y=1	p(0,1)=1/2	p(1,1)=0

Independence

$(X_1,\dots,X_n)$ are (jointly) independent if

F(x_1,\dots,x_n) = F_{X_1}(x_1)\dots F_{X_n}(x_n)

$x_1,\dots,x_n \in \mathbb{R}$ . Equivalently,

$p(x_1,\dots,x_n)=p_{X_1}(x_1)\dots p_{X_n}(x_n)$ .
$f(x_1,\dots,x_n)=f_{X_1}(x_1)\dots f_{X_n}(x_n)$

Example $n$ $(X_1,\dots,X_n)$ $X_i$ $\lambda$ $(X_1,\dots,X_n)$ .

$p(x_1,\dots,x_n) = \prod_{i=1}^n p_{X_i}(x_i) = \prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{e^{n\lambda}\lambda^{\sum_{i=1}^n x_i}}{\prod_{i=1}^n x_i!}$

Expectation and Covariance

$X$ $Y$ $X$ $Y$ covariance $X$ $Y$ .

$X$ $Y$ , defined as

{\rm Cov}[X,Y]=E[(X−E[X])(Y−E[Y])]

Properties:

$f$ $g$ $f,g:\mathbb{R}^2 \rightarrow \mathbb{R}$ ,

$E[f(X,Y)+g(X,Y)]=E[f(X,Y)]+E[g(X,Y)]$
${\rm Var}[X+Y]={\rm Var}[X]+{\rm Var}[Y]+2{\rm Cov}[X,Y]$
${\rm Cov}(X,Y) = E[XY]-E[X]E[Y]$ .
${\rm Cov}(aX,bY) = ab{\rm Cov}(X,Y)$ $a,b\in\mathbb{R}$ .
${\rm Cov}(X,X) = {\rm Var}(X)$ .
$X$ $Y$ ${\rm Cov}[X,Y]=0$
- ${\rm Cov}(X,Y)=0$ $X$ $Y$ ${\rm Cov}(X,Y)\neq 0$ $X$ $Y$ are not independent.
$X$ $Y$ $E[f(X)g(Y)]=E[f(X)]E[g(Y)]$ .

Example $X$ $Y$ ${\rm Var}(2X+3Y)$ .

$\begin{align} {\rm Var}(2X+3Y) &= {\rm Var}(2X) + {\rm Var}(3Y) + {\rm} 2{\rm Cov}(2X,3Y)\\ &= 4{\rm Var}(X)+ 9{\rm Var}(Y) + 12{\rm Cov}(X,Y) \end{align}$
We have,
$X \sim Ber(0.5)$ $Y \sim Ber(0.5)$ ${\rm Var}(X) = {\rm Var}(Y) = 0.25$ .
${\rm Cov}(X,Y) = E[XY] - E[X]E[Y] =-0.5^2$ .
Therefore,
${\rm Var}(2X+3Y) = 4/4 +9/4 - 12/4 = 1/4$ .

Moment generating functions

$X$ moment generating function $M_X(t)$ random variable $X$ $t\in\mathbb{R}$ by

M_X(t) = E[e^{tX}].

$[Math Processing Error](X_1,\dots,X_n)$ moment generating function $M_{X_1,\dots,X_n}(t)$ random vector $(X_1,\dots,X_n)$ $t=[t_1,\dots,t_n] \in \mathbb{R}^n$ by

M_{X_1,\dots,X_n}(t) = E[e^{t_1X_1+\dots+t_nX_n}].

One of the most important results regarding moment generating functions (mgfs) is the following uniqueness theorem, which says that random variables from the same probability distribution have the same moment generating function.

Uniqueness Theorem $X$ $Y$ $M_X$ $M_Y$ $M_X(t) = M_Y(t)$ $t \in [-C,C]$ $C>0$ $X$ $Y$ have the same distribution.

Remark characterize $X$ $X$ .

$X$ $M_X(t) = e^{\lambda(e^t -1)}$ $t \in \mathbb{R}$ $M_X$ $X$ $\lambda$ .

Example $(X_1, X_2,..., X_n)$ $X_i$ $i = 1, 2,..., n$ ${\rm Poisson}(2)$ $X_i\sim {\rm Poisson}(2)$ $X_1+\dots+X_n$ .

$Z = X_1+X_2+\dots,X_n$ $Z$ .
$M_Z(t) = E[e^{tZ}] = E[e^{t(X_1+\dots+X_n)}] = E[e^{tX_1}e^{tX_2}\dots e^{tX_n}]$
$X_1,\dots,X_n$ are independent,
$M_Z(t) = E[e^{tX_1}]E[e^{tX_2}]\dots E[e^{tX_n}] = M_X(t)^n = e^{n\lambda(e^{t}-1)}$
$Z\sim {\rm Poisson}(n\lambda)$ .

Some limit theorems

Weak Law of Large Numbers (WLLN)
$X_1,\dots,X_n \sim F$ $E[X_i] = \mu, \, {\rm Var}(X_i)=\sigma^2<\infty$ $\mu,\sigma^2<\infty$ ,
$\epsilon>0$ , we have,
$P(|\bar{X}-\mu|\geq \epsilon) \underset{n\rightarrow \infty}{\rightarrow} 0.$
$n$ i.i.d random variables is close to the population mean.
Central Limit Theorem (CLT)
$X_1,\dots,X_n \sim F$ $E[X_i] = \mu, \, {\rm Var}(X_i)=\sigma^2<\infty$ $\mu,\sigma^2<\infty$ ,
$t\in\mathbb{R}$ $, we have,
$P(\frac{\bar{X}-E[\bar{X}]}{\sqrt{{\rm Var}(\bar{X})}}\le t) = P(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\le t) \underset{n\rightarrow \infty}{\rightarrow} \Phi(t) \tag{1}$
$\Phi(t)$ is a cdf of a standard normal distribution. Equivalently,
$P (\frac{S_n-E[S_n]}{\sqrt{{\rm Var}(S_n)}}\leq t)=P(\frac{S_n-n\mu}{\sqrt{n}\sigma}\le t) \underset{n\rightarrow \infty}{\rightarrow} \Phi(t) \tag{2}$

$S_n = \sum_{i=1}^n X_i$ .

In words, the distribution of the standardized sample mean or sum is close to the normal distribution.

We also write (1) and (2) as

\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \dot\sim N(0,1) \quad \mbox{ and }\quad \frac{S_n-n\mu}{\sigma\sqrt{n}} \dot\sim N(0, 1).

or equivalently,

\bar{X} \dot\sim N(\mu, \frac{\sigma^2}{n} ) \quad \mbox{ and }\quad S_n \dot\sim N(n\mu, n\sigma^2).